Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards
Authors
Abstract
This paper addresses a way to generate mixed strategies using reinforcement learning algorithms in domains with stochastic rewards. A new algorithm based on the Q-learning model, called TERSQ, is introduced. Unlike other approaches for stochastic scenarios, TERSQ uses a single global exploration rate for all state/action pairs within the same run. This exploration rate is selected at the beginning of each run from a probability distribution, which is updated once the run is finished. In this paper we compare TERSQ with similar approaches that use probability distributions depending on state-action pairs. Two experimental scenarios have been considered. The first deals with the problem of learning the optimal way to combine several evolutionary algorithms used simultaneously by a hybrid approach. In the second, the objective is to learn the best strategy for a set of competing agents in a combat-based videogame.
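The paper's own pseudocode for TERSQ is not reproduced here. As a rough illustration of the run-level mechanism the abstract describes, the sketch below pairs tabular Q-learning with one exploration rate per run, drawn from a discrete distribution that is reweighted after the run ends. The class name, candidate rates, and reweighting rule are all illustrative assumptions, not the published TERSQ equations.

```python
import random
from collections import defaultdict

class RunLevelEpsilonQLearner:
    """Sketch (assumed details): Q-learning where a single exploration
    rate is sampled once per run and the sampling distribution is
    updated from the run's return."""

    def __init__(self, actions, candidate_rates=(0.05, 0.1, 0.2, 0.4),
                 alpha=0.1, gamma=0.95):
        self.actions = list(actions)
        self.q = defaultdict(float)            # Q[(state, action)]
        self.alpha, self.gamma = alpha, gamma
        self.rates = list(candidate_rates)
        self.weights = [1.0] * len(self.rates)  # preference per rate
        self.epsilon = None                     # fixed for a whole run

    def start_run(self):
        # One global exploration rate for every state/action this run.
        self.epsilon = random.choices(self.rates, weights=self.weights)[0]

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next
                                        - self.q[(s, a)])

    def end_run(self, total_return):
        # Reweight the chosen rate by the return it achieved
        # (an assumed update rule, for illustration only).
        i = self.rates.index(self.epsilon)
        self.weights[i] = 0.9 * self.weights[i] + 0.1 * max(total_return, 0.0)
```

Because the rate is fixed per run rather than per state-action pair, runs with different sampled rates behave like arms of a bandit over exploration levels, which is the contrast the abstract draws with per-state-action distributions.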
Similar Papers
Learning Exploration Policies with Models (Conference on Automated Learning and Discovery, CONALD'98)
Reinforcement learning can greatly profit from world models updated by experience and used for computing policies. Fast discovery of near-optimal policies, however, requires focusing on "useful" experiences. Using an additional exploration model, we learn an exploration policy maximizing "exploration rewards" for visits of states that promise information gain. We augment this approach by an exte...
Learning exploration strategies in model-based reinforcement learning
Reinforcement learning (RL) is a paradigm for learning sequential decision-making tasks. Typically, however, the user must hand-tune exploration parameters for each domain and/or algorithm they use. In this work, we present an algorithm called leo for learning these exploration strategies on-line. This algorithm makes use of bandit-type algorithms to adaptively select explor...
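The snippet above only names the bandit idea, so as a hedged sketch: a bandit algorithm such as UCB1 can pick among candidate exploration strategies episode by episode, treating each strategy as an arm and the episode return as its reward. The strategy names and the random reward stand-in below are assumptions for illustration; they are not leo's actual components.

```python
import math
import random

def ucb1_select(counts, values, t):
    """UCB1 index: pick the arm maximizing mean + sqrt(2 ln t / n)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i                      # try each strategy once first
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

# Illustrative strategy names only; a method like leo would pair such a
# bandit with real exploration strategies inside a model-based RL loop.
strategies = ["epsilon-greedy", "boltzmann", "optimistic-init"]
counts = [0] * len(strategies)
values = [0.0] * len(strategies)
for t in range(1, 101):
    i = ucb1_select(counts, values, t)
    reward = random.random()              # stands in for the episode return
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]   # running mean
```

Over many episodes the bandit concentrates its pulls on whichever exploration strategy yields the highest average return, which is the on-line adaptation the abstract describes.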
Fifth International Conference on Simulation of Adaptive Behavior (SAB
Model-Based Reinforcement Learning (MBRL) can greatly profit from using world models for estimating the consequences of selecting particular actions: an animat can construct such a model from its experiences and use it for computing rewarding behavior. We study the problem of collecting useful experiences through exploration in stochastic environments. Towards this end we use MBRL to maximize ex...
Reinforcement Learning mit adaptiver Steuerung von Exploration und Exploitation (Reinforcement Learning with Adaptive Control of Exploration and Exploitation)
Using computational models of reinforcement learning (RL), intelligent behavior based on sensorimotor interactions can be learned (Sutton and Barto, 1998). This way of learning is inspired by neurobiology and psychology: an artificial agent performs actions within its environment, which responds with a reward signal describing the action's utility. Therefore, the natural objectiv...
Scaling Up Reinforcement Learning through Targeted Exploration
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows, because they spend too much effort exploring. We introduce an RL algorithm, State TArgeted R-MAX (STAR-MAX), that explores only a subset of the state space, called the exploration envelop...